Speaker and channel-normalized set of formant parameters for telephone speech recognition

Authors

Boris Lobanov

T. Levkovskaya

Igor E. Kheidorov

Abstract

The speech parameters most commonly used today are cepstral coefficients derived from the FFT or LPC spectrum. An alternative approach that can potentially provide maximum speaker and channel independence is the estimation of articulatory-based features such as formant frequencies, amplitudes, and degree of voicing. This report describes a new method and algorithm for robust estimation of F1(t), F2(t), F3(t), A1(t), A2(t), A3(t), and V(t) from a telephone speech signal, together with procedures for normalizing them against speaker and channel variability. The experimental results confirm that the proposed set of formant parameters is resistant to speaker and channel variability. According to the experiments, it gives a significant improvement in recognition performance compared with cepstral parameters.
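The abstract does not spell out the normalization procedures, so the following is only a minimal sketch of what normalizing such tracks against speaker and channel variability could look like. It assumes per-utterance z-scoring of each formant frequency track (a common speaker-normalization technique, not necessarily the one used in the paper) and mean subtraction of log amplitudes in dB, which cancels a fixed channel gain. The function names `zscore_track`, `normalize_formants`, and `remove_channel_gain` are hypothetical.

```python
from statistics import mean, pstdev


def zscore_track(track):
    """Scale one parameter track to zero mean and unit variance."""
    mu = mean(track)
    sigma = pstdev(track)
    if sigma == 0:
        # A constant track carries no variation to normalize.
        return [0.0 for _ in track]
    return [(x - mu) / sigma for x in track]


def normalize_formants(frames):
    """Z-score each formant column of one speaker's frames.

    frames: list of (F1, F2, F3) tuples in Hz (assumed layout).
    """
    columns = list(zip(*frames))
    normalized = [zscore_track(col) for col in columns]
    return [tuple(row) for row in zip(*normalized)]


def remove_channel_gain(amplitudes_db):
    """Subtract the utterance mean from log amplitudes in dB.

    A fixed channel gain is an additive offset in dB, so it cancels.
    """
    mu = mean(amplitudes_db)
    return [a - mu for a in amplitudes_db]


if __name__ == "__main__":
    frames = [(300, 2200, 3000), (700, 1200, 2500)]
    print(normalize_formants(frames))
    print(remove_channel_gain([60.0, 66.0, 63.0]))
```

Under these assumptions, two recordings of the same amplitude track that differ only by a constant gain produce identical output from `remove_channel_gain`, which is the property a channel normalization is meant to provide.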

Similar resources

Beware of the ‘telephone effect’: the influence of telephone transmission on the measurement of formant frequencies

Speech scientists often have to work with speech signals that have been transmitted over the telephone. Although the acoustic properties of telephone transmission such as the band-pass filter characteristics are well known, little attention has been paid to their effect on the measurement of speech parameters. This study deals with artefacts introduced by the lower cut-off slope of the transmis...

Speaker normalized spectral subband parameters for noise robust speech recognition

This paper proposes speaker normalized spectral subband centroids (SSCs) as supplementary features in noise environment speech recognition. SSCs are computed as frequency centroids for each subband from the power spectrum of the speech signal. Since the conventional SSCs depend on formant frequencies of a speaker, we introduce a speaker normalization technique into SSC computation to reduce the...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language

Setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of emotion on the anger and happiness was evaluated and the results were compared with the neutral speech. So the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...

Automatic speech recognition of co-channel speech: integrated speaker and speech recognition approach

This paper presents a novel Bayesian approach to the problem of co-channel speech. The problem is formulated as the joint maximization of the a posteriori probability of the word sequence and the target speaker given the observed speech signal. It is shown that the joint probability can be expressed as the product of six terms: a likelihood score from a speaker-independent speech recognizer, th...

Publication date: 1999
